Online Meta-learning by Parallel Algorithm Competition

Authors

  • Stefan Elfwing
  • Eiji Uchibe
  • Kenji Doya
Abstract

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, and it has arguably become more pressing with the recent success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make comprehensive searches for appropriate meta-parameter values infeasible. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before learning continues, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, 10×10, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa(λ) agents in three Atari 2600 games by 62% or more. The experiments also show that the OMPAC method can adapt the meta-parameters according to the learning progress in different tasks.
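The run-select-perturb loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Instance`, `run_generation`, `select`, `perturb`, and `ompac` are hypothetical, the "learner" is a toy stand-in whose score peaks at `alpha = 0.1`, the meta-parameter names (`alpha`, `lambda`, `tau`) are illustrative, the abstract does not specify the selection rule (binary tournament selection is assumed here), and the noise scale being proportional to each value's magnitude is also an assumption.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One learning-algorithm instance: meta-parameters plus learner state."""
    meta: dict                                  # e.g. {"alpha": ..., "lambda": ...}
    state: dict = field(default_factory=dict)   # stands in for learned weights
    score: float = 0.0                          # performance in the last generation


def run_generation(inst, episodes, rng):
    # Hypothetical toy learner (assumption): performance is best near alpha = 0.1,
    # and "progress" accumulates so that learning continues across generations.
    gain = episodes / (1.0 + 10.0 * abs(inst.meta["alpha"] - 0.1))
    inst.state["progress"] = inst.state.get("progress", 0.0) + gain
    inst.score = gain + rng.gauss(0.0, 0.1)     # noisy performance measurement


def select(pop, rng):
    # Assumed rule: binary tournament selection. The winner is copied whole,
    # i.e. both its meta-parameters and its learner state survive.
    new_pop = []
    for _ in range(len(pop)):
        a, b = rng.sample(pop, 2)
        new_pop.append(copy.deepcopy(a if a.score >= b.score else b))
    return new_pop


def perturb(inst, p_noise, rel_sigma, rng):
    # With probability p_noise, add Gaussian noise to each meta-parameter.
    # Values are kept strictly positive (assumed valid range).
    for k, v in inst.meta.items():
        if rng.random() < p_noise:
            inst.meta[k] = max(1e-6, v + rng.gauss(0.0, rel_sigma * abs(v)))


def ompac(initial_meta, n_instances=8, generations=20, episodes=10,
          p_noise=0.1, rel_sigma=0.05, init_sigma=0.05, seed=0):
    rng = random.Random(seed)
    # Start instances with small random differences in the meta-parameters.
    pop = [Instance(meta={k: max(1e-6, v * (1.0 + rng.gauss(0.0, init_sigma)))
                          for k, v in initial_meta.items()})
           for _ in range(n_instances)]
    for _ in range(generations):
        for inst in pop:                        # run each instance for a fixed
            run_generation(inst, episodes, rng) # number of episodes
        pop = select(pop, rng)                  # select on performance
        for inst in pop:                        # then add noise before continuing
            perturb(inst, p_noise, rel_sigma, rng)
    return pop
```

A usage example: `final = ompac({"alpha": 0.3, "lambda": 0.8, "tau": 0.5})` returns the final population, and `max(final, key=lambda i: i.score).meta` gives the meta-parameters of the best-performing instance. Copying the learner state along with the meta-parameters mirrors the idea that selected instances continue learning rather than restarting.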

Journal:
  • CoRR

Volume: abs/1702.07490

Publication date: 2017